Graph Attention Informer for Long-Term Traffic Flow Prediction under the Impact of Sports Events

Traffic flow prediction is one of the challenges in the development of an Intelligent Transportation System (ITS). Accurate traffic flow prediction helps to alleviate urban traffic congestion and improve urban traffic efficiency, which is crucial for promoting the synergistic development of smart transportation and smart cities. With the development of deep learning, many deep neural networks have been proposed to address this problem. However, due to the complexity of traffic maps and external factors, such as sports events, these models cannot perform well in long-term prediction. In order to enhance the accuracy and robustness of the model on long-term time series prediction, a Graph Attention Informer (GAT-Informer) structure is proposed by combining the graph attention layer and informer layer to capture the intrinsic features and external factors in spatial–temporal correlation. The external factors are represented as sports events impact factors. The GAT-Informer model was tested on real-world data collected in London, and the experimental results showed that our model has better performance in long-term traffic flow prediction compared to other baseline models.


Introduction
Traffic flow prediction is a core component of an Intelligent Transportation System (ITS) [1].An ITS integrates people, vehicles, and roads into a comprehensive consideration so that they can work closely together to achieve a synergistic effect [2].Traffic flow prediction is a rather complex process with multiperiod features, which is affected by a variety of factors such as traffic patterns, abnormal events, bad weather, and data collection [3].For the government, traffic flow prediction mainly serves at the macro level, such as in urban master planning, urban transportation development policy, and urban transportation professional planning [4].Accurate and efficient traffic flow prediction helps to alleviate urban traffic congestion and improve urban transportation efficiency [5,6].For individual travelers, traffic flow prediction is an important reference for path planning [7], which is helpful for people to plan their travel routes in advance to avoid getting into traffic congestion.
In modern society, traffic congestion is a common problem [8,9].Traffic congestion not only causes inconvenience to people's traveling, but it also may produce a series of problems for some specific occasions, such as sports events.There have been some reports of teams not being able to get to the stadium on time due to traffic congestion, thus causing games to be postponed.For fans, the delay has forced them to readjust their follow-up plans, and for those who traveled from afar, this may cause more inconvenience.During largescale sports events, urban traffic will see a surge in urban road traffic and public transport passenger flow [10].Compared to normal urban traffic planning, the traffic organization during large-scale sports events has unique travel and service characteristics [11]: it requires safe, smooth, reliable, and high-level services for the event traffic and must ensure the normal operation of the host city's traffic system at the same time.
With the arrival of the era of big data, the development of big data's enabling of transportation has attracted wide attention [12].Based on a better traffic flow prediction result, the traffic police can conduct timely traffic diversion to ease congestion caused by excessive traffic.While companies such as cabs, online car-hailing and bike-sharing can conduct vehicle scheduling to safeguard the public's need for transportation.Short-term prediction provides accurate forecasting from minutes to hours, thus benefiting applications such as real-time traffic control and traveler information systems.This includes practices like adjusting traffic signal timings based on real-time data, rapid response to incidents such as accidents and breakdowns to minimize disruptions, and providing real-time updates and rerouting suggestions through navigation systems.Meanwhile, long-term prediction offers forecasts over weeks to months, which is helpful for developing traffic management plans for special events to handle temporary surges.In this case, how to effectively use big data to make good traffic flow prediction, especially under the impact of special events, has become a new challenge.
In terms of the development process of traffic flow prediction, the methods can be categorized into classical methods and deep learning methods [13].Among the classical methods, models based on statistical methods have better interpretability, such as the Historical Average Model (HA) [14], the Autoregressive Integrated Moving Average (ARIMA) [15], and the Kalman filter [16].Models based on traditional machine learning methods are more flexible, such as the Support Vector Machine (SVM) [17], K-Nearest Neighbor (KNN) [18], and Hidden Markov Model (HMM) [19].However, classical methods are difficult to characterize the deep nonlinear relationships in spatiotemporal series, and they are not ideal for predicting data with high data perturbation.In recent years, with the development of deep learning, more and more researchers tend to consider deep learning methods, such as the Convolutional Neural Network (CNN) [20], Recurrent Neural Network (RNN) [21], Graph Convolutional Network (GCN) [22], Gate Recurrent Unit (GRU) [23], Long Short-Term Memory (LSTM) [24], and their variants.These models perform well in short-term prediction, but their accuracy and stability are not satisfactory for long-term prediction.
Compared to the above analysis, we propose a novel deep neural network model, which can effectively combine the spatiotemporal information of traffic maps and accurately realize long-term traffic flow prediction.The contributions are as follows:

•
To the best of our knowledge, this is the first study to consider the impact of major sports events in traffic flow prediction.

•
The Graph Attention Informer (GAT-Informer) structure is firstly proposed to address the long-term traffic flow prediction problem by combining the Graph Attention Network and the Informer Network.

•
In addition to using the classical dataset, another dataset used to verify and test GAT-Informer is newly collected real-world data containing real sports events.This dataset has not been applied to other published articles.
The rest of the work is organized as follows: Section 2 issues the problem and describes the development of traffic flow prediction methods.Section 3 first gives the quantitative criteria for the impact of sports events on traffic flow; it then follows with the description of the model structure.Section 4 presents the detailed experimental results and the visualization of model performance comparisons with analysis.Section 5 summarizes the conclusions and the outlook for future research.

Problem Statement
Traffic flow is the number of vehicles passing through a spatial unit (e.g., a roadway segment or a traffic detector) in a given time period [25].Traffic flow prediction can be categorized into short-term prediction and long-term prediction according to the prediction length.Long-term traffic flow prediction corresponds to transportation planning, while short-term traffic flow prediction corresponds to traffic guidance, traffic management, and control.There is no clear criterion for which category it belongs to, and it is generally believed that a prediction length of 5-30 min is short-term, while more than 30 min is long-term [26,27]; some researchers also define 1 day and above as long-term [28].In this paper, the prediction length of 30 min or less is regarded as a short-term prediction.For a large-scale sports event, the traffic department needs to do traffic planning in advance so as to establish an efficient and coordinated departmental communication and traffic command system to ensure the effective implementation of the various stages of planning and orderly operation.

Classical Prediction Model
Without considering traffic maps, traffic flow prediction problems can be reduced to a time series analysis.ARIMA is the most common time series prediction model in statistical modeling [29].It contains Autoregressive (AR), Integrated (I), and Moving Average (MA) models.The AR model captures historical trends and makes forecasts accordingly, but it is not ideal to deal with data with sudden changes.The I model performs linear subtraction of data with equal period intervals to smooth the data.The MA model makes up for the shortcomings of the AR model and is used to deal with noisy terms.Many variants of ARIMA have been proposed to continuously optimize the model implementation over time.Seasonal ARIMA [30] supports time series data with seasonal components, Subset ARIMA [31] improves the accuracy of short-term prediction tasks, and FARIMA [32] improves the accuracy of long-term prediction tasks.However, these models require data with high smoothness and a linear relationship, which makes it difficult to deal with data containing special events such as sports events.

Deep Learning-Based Prediction Model
In order to utilize the advantages of each neural network to realize a better prediction performance, a combination of two or more models is generally used.With the proposal of an attention mechanism in the Transformer [33], many researchers applied it to traffic flow prediction.DNN-BTF [7] applies an attention mechanism to automatically learn the importance of past traffic flow information and uses a CNN to mine the spatial features, as well as an RNN to mine the temporal features of traffic flow.GCN-DHSTNet [34] dynamically learns the spatiotemporal features of traffic crowd flow data on a citywide scale and simultaneously predicts the traffic flow in each area.STGCN [26] is the first to apply graph convolution to the traffic prediction problem.Based on this, ASTGCN [35] introduces an attention mechanism to capture spatiotemporal correlations.GMAN [36] predicts the traffic conditions at different locations at different time steps on a road network graph.DMSTGCN [37] devises a dynamic graph constructor and a dynamic graph convolution method to propagate the hidden states of the nodes based on dynamic spatial relationships.STFGCN [38] efficiently learns the hidden spatiotemporal dependencies using a novel fusion operation of various spatial and temporal graphs to process different time steps in parallel.
In conclusion, motivated by related works and existing problems, the Graph Attention Informer (GAT-Informer) network for long-term traffic prediction, which combines both the spatial and temporal correlations of each detector in the traffic communication network while satisfying the high accuracy requirement of long-term time series prediction is proposed in this paper.In the next section, the structure of the entire model will be illustrated in detail.

Problem Formulation
At time t, the input data contain the traffic flow information of each detector, X t = [x 1 , x 2 , . . ., x n ], where X t ∈ R n×F , x a ∈ R F refers to the ath input node with a feature dimension F, and n is the number of total detectors.A graph is the basic structure of a Graph Neural Network (GNN) [39], and the traffic map is a special type of graph.It can be defined as follows: where V is the set of nodes, E is the set of edges, and A is the adjacency matrix.Node set V with n nodes contains the traffic flow information for each node, and different nodes are connected by the edge set E with a link adjacency matrix A. We consider a directed graph, and the node adjacency is described in the adjacency matrix A indicating the connectivity between nodes.
The main purpose of the graph-based traffic flow prediction is to predict the future N step traffic flow, which can be described as follows: where X is the historical traffic flow data, Y is the future traffic flow value, f is the relationship function, M is the time step of historical data usage, and N is the prediction horizon.The effect of sports events is a dynamic factor which variates by time.An impact factor σ is utilized to denote the effect of sports events in the temporal domain.At time step t, the impact factor is described as σ t ∈ R n and the M steps σ ∈ R M×n .With the consideration of sports events effect, the prediction can be further expressed as: In conclusion, the issue of traffic flow prediction under the impact of sports events is to find the relationship function f , which will be realized by our GAT-Informer model.In the next subsection, the mathematical expression of the external factor that we considered in this paper will be introduced in detail.

Definition of Sports Events Impact Factor
The impact of sports events is generally temporary, and it may cause a spike in traffic flow around the arena.It usually occurs before the start of the game and after the end of the game, which indicates a large number of people entering the vicinity of the arena and a large number of people leaving the vicinity of the arena, respectively.Hence, we can assume that sports events have a weak impact on traffic flow on other days.In this paper, we consider both of the temporal and spatial effects of sports events, since they are also geographically limited.
Based on these assumptions, we use a function with a shape similar to the Gaussian distribution to represent the extent to which sports events affect traffic flow during this special period.In order to make the curve smoother and cover more time periods, we choose the random variable Z to obey the Gaussian distribution with mathematical expectation µ and variance Var(Z), which is written as Z ∼ N(µ, Var(Z)).Furthermore, the probability density function is amplified by Var(Z) to enhance its amplitude and enlarge the influence of impact factor σ. In order to better express the continuous impact on traffic flow before and after the game, a maximum operation is applied to these two distribution functions.Thus, the sports event's impact factor σ can be calculated as follows: where t start is the start time of the game; t end is the end time of the game; z is the sampling time point.As most people will choose to enter the stadium ahead of the start time and stay for a while after the game before leaving, we use ∆t to represent this situation.The sports events impact factor σ is illustrated in Figure 1.Given that the geographical impact of sports events is significant, we introduce a geographical factor β that follows an amplitude-enhanced two-dimensional Gaussian distribution.For node i, this factor can be expressed as follows: where (x i , y i ) represents the geographical location of node i, and (x 0 , y 0 ) denotes the arena location where the sports event is held.The X and Y directions share the same variance τ 2 .Under the geographical effect factor β, σ i for node i can be further derived as follows:

GAT-Informer Architecture
In this subsection, the structure of the GAT-Informer model is introduced in detail.GAT-Informer contains two layers: the Graph Attention layer and the Informer layer.The Graph Attention layer is used to fuse neighbor node information of the input data, so as to capture their spatial features, and then output the time series data with fused neighbor features.The Informer layer is used to process this time series data to achieve long-term prediction.The overview of the GAT-Informer architecture is illustrated in Figure 2. External factors will be introduced into both the GAT layer and the Informer layer for external factor fusion.

Graph Attention Layer
Graph Attention Network (GAT) [40] is a special neural network designed for processing graph-structured data.Unlike traditional neural networks, GAT takes into full consideration the relationships between data during processing, which enables the ability to capture the correlations between data more accurately without manual presetting.When there is a sports event, the adjacency relationships of the traffic network will change, thus rendering the pre-established road adjacency matrix less effective.That is why GAT stands out from other graph neural networks.It is capable of recognizing dynamic node adjacencies and capturing real-time correlations by calculating attention scores for adjacent nodes.
The input data of the Graph Attention layer contains three parts: a set of node features, the adjacency matrix, and the external impact factor for the consideration of sports events in the spatial information extraction.Node features can be described as an , where n is the number of detectors in the traffic map, and F is the number of input features for each node.Since GAT is a layered strucutre, the input node features of the first GAT layer would be the traffic flow observation X.The distance between the detectors can be calculated from the longitude and latitude information in the traffic map, and then the adjacency matrix can be obtained by combining the connection relationship between them.The traffic map with spatial connectivity of the detectors is shown in Figure 3.This traffic map shows the connectivity and relative position of the nodes, but it does not represent the true spatial layout and shape of the road.The entire framework of Graph Attention layer is shown in Figure 4.As discussed above, the input to this layer contains an F × n matrix (set of node features) and an n × n matrix (adjacency matrix).W and A are both linear transformation matrices with shared parameters and are used for dimension transform.For node i and the neighbouring node j, attention score e ij can be calculated as follows: where || means concatenation.N i refers to neighboring nodes for node i.
Although the adjacency matrix is not explicitly present in the core formulas, its role is implicitly crucial.The adjacency matrix defines the relationships between nodes, and only adjcent nodes would be considered when calculating e ij .In this paper, we employ a layered strucutre of GAT to perform feature aggregation, and each layer of GAT aggregates information from the direct neighbors of the nodes.
The aggregation of node features is enhanced using a multihead attention mechanism and the sports events impact factor σ. Final output of node i can be expressed as follows: where e k ij are normalized attention scores for the kth head of attention.W k is the corresponding linear input transformation, and φ(x) denotes a nonlinear activation function.hj refers to the average value of h j .
During the feature aggregation, signal from nodes affected by sports events would be enhanced on the basis of their distance to the arena.Final output matrix H remains in the form F × n and contains all aggregated nodes.

Informer Layer
Informer is a variant of the Transformer, which achieves higher efficiency than the Transformer in long-sequence time series forecasting [41].In the scenario of traffic prediction considering sports events, the foreacasting needs to cover a long time span, and the Informer is famous for its ability of long-term prediction and a lower computational cost compared to the Transformer, which enables it to better capture the temporal dependency in a longer prediction horizon.The Informer has an encoder-decoder structure, and the architecture of the Informer layer is shown in Figure 5.The input of the encoder contains two parts: one is the same sequences as the output sequences from the Graph Attention layer H; the other one is the sports events impact factor defined in Section 3.2.For node i, the time series impact factor σ i would be regarded as a temporal indicator and concatenated with the Informer input sequence h i ∈ R M , which can be expressed as follows: By concatenating these two together, each time period will be given a label indicating whether the traffic flow was affected by sports events at that time and the extent to which it was affected.The input of the decoder is similar to the input of the encoder; it also consists of two parts: the main difference is that the input sequences are no longer taken exclusively from the dataset but contain a portion of the data to be predicted, which are initialized by 0. The advantage of this configuration is that the real data in the dataset can guide the predict data, thus improving the accuracy of prediction.The main function of the Informer layer is to assign values to these predict data to obtain the output sequences.The core idea of the Informer layer is a ProbSparse attention mechanism.The general attention mechanism can be expressed as follows: where Q denotes query, K denotes key, V denotes value, d is the dimension of K.
In order to reduce time complexity of this algorithm, ProbSparse attention mechanism optimizes the choice of Q and K. Due to the potential sparsity of the probability distribution of self-attention, only a small number of dot products contribute to the primary attention.Hence, Q is sampled by keeping the active ones and replacing the others with uniform distribution; then, Equation (15) will be updated as follows: where Q has the same size as Q, but we replace the inactive values with mean values.ProbSparse attention mechanism is only used in self-attention in the encoder.For selfattention in the decoder, since the input sequences contain the data need to be predicted, and due to the prediction causal, future values cannot be used to calculate current values; this will result in the original number of Q and K being already reduced.Hence, there is no need to do further sampling.
The encoder has two layers and contains two self-attention blocks.It forms a basic unit of attention-convolution-maxpooling-attention.If the input sequence is excessively lengthy, it can be extended by adding multiple groups of convolution-maxpoolingattention units after the last attention block.This can effectively reduce the feature dimensions and avoid overfitting.The decoder has one layer and contains only one self-attention block.The output of this block will conduct a crossattention with the output of the encoder.

Loss Function
In order to minimize the error during model training, MSE loss is used as the loss function, which can be written as follows: where n is the number of samples.

Experiment and Analysis
Our GAT-Informer model was evaluated on two datasets: one is the PeMS04 dataset, which is widely used for traffic flow prediction, and the other one is a newly collected London dataset.We chose five models commonly used for time series prediction for performance comparison.In evaluation using the PeMS04 dataset, the impact of sports events on traffic flow will be disregarded; the target is to compare the performance of these models on solving time series prediction problem.In the London dataset, the sports events impact factor will be included.

PeMS04 Dataset
The PeMS04 dataset records data from 307 detectors on 29 roads in San Francisco Bay Area collected every 5 min between 1 January 2018 and 28 February 2018.The shape of the data is (16,992,307,3),where the three-dimensional features are flow, occupy, and speed.Only the flow feature is used in our experiments, thus constructing a (16,992, 307) traffic flow sequence.

London Dataset
The London dataset records data from 96 detectors on three motorways around London collected every 15 min between 1 August 2022 and 31 October 2022.These three motorways contain M1, M4, and M25.The M1 motorway is a major north-south motorway in England, which connects London and Leeds.The M4 motorway is one of the most important motorways in the United Kingdom, thus running from west London to southwest Wales.The M25 motorway, also known as the London Orbital Motorway, is a ring-shaped high-speed road that encircles London.In order to better study the impact of sports events on traffic flow, the section of M4 motorway and M25 motorway around London Heathrow Airport were selected.The location of detectors used in this dataset is shown in Figures 6 and 7.The dataset contains a (8740, 96) traffic flow sequence.The Laver Cup was held from 23 September 2022 to 25 September 2022 in the O2 arena in London, and this match gained a high level of global attention.Our dataset covers this time period to study the impact that sports events have on traffic flow.Figure 8 shows the traffic flow data recorded by our 2178A detector on the M4 motorway between 1 September 2022 and 30 September 2022.Obviously, there was an uptick in traffic flow around match days compared to other times of this month.Based on the definition in Section 3.2, the sports events impact factor will be calculated for this special period and used as the input of our GAT-Informer model.

Data Preprocessing
Both the PeMS04 dataset and London dataset were min-max normalized in order to accelerate the model convergence, avoid numerical instability, and enhance the model performance.The normalization can be expressed as follows:

Experimental Verification 4.2.1. Evaluation Metrics
Three commonly used metrics were chosen to evaluate the model performance during model training, including the Root Mean Squared Error (RMSE), the Mean Absolute Error (MAE), and the Mean Absolute Percentage Error (MAPE).Each of the metrics is defined as follows: where y i and ŷi represent the true traffic flow and the predicted one at time i, respectively; n is the prediction length; Y is the set of y i ; and Ȳ is the average of Y.

Experiment Settings
In the experiment, the collected data were divided into 60% for training, 20% for evaluation, and the remaining 20% for testing.For spatial information extraction, two layers of GAT with three heads of multihead attention were used.For temporal information extraction, three layers of encoder and one layer of decoder with eight heads of multihead attention were employed.The loss function of the MSE was utilized and the loss backwards from decoder outputs to the entire network.The batch size was set to be 32, and the model was optimized using the Adam optimizer with a dropout rate being set to 0.05.The learning rate was adapted from e −4 , and the number of total training epochs was 50 with early stopping.The experiments were conducted on a computer with an Intel Core i7-12800H 2.40 GHz CPU (Santa Clara, CA, USA) and a RTX 3060 GPU (NVIDIA, Santa Clara, CA, USA).During the experiment, our proposed model will be evaluated across different prediction horizons, together with baseline models, in order to have a comprehensive evaluation in short-term prediction and long-term prediction.

Experimental Results
The traffic flow prediction can be viewed as a time series forecasting problem containing multiple features.For the PeMS04 dataset, we selected five baseline models, and these models have been proven to perform well in dealing with time series problems. •

GRU:
The GRU is a variant model of RNN.It has been proven to be effective in the short-term time series forecasting problems, and the model also stands out for its ability to alleviate the gradient explosion and vanishing problem.However, the GRU is less effective in the scenario of long-term prediction.

• T-GCN:
The Integrated GCN and GRU capture both spatial and temporal correlation, and they can be used for both short-term and long-term traffic prediction [42].However, similar to recurrent-based temporal information, their extraction capacity is weak in long-term prediction.

•
Informer: The Informer model addresses the computational cost and memory usage problem of the Transformer.The model is widely used in short-term or long-term traffic prediction, but it cannot extract information from adjacent nodes.• ASTGCN: The ASTGCN model is designed for traffic prediction that excels in capturing spatial and temporal dependencies [35].The novel spatial-temporal attention mechanism enables the model to achieve high performance in traffic prediction.However, this model is constrained by the static map and cannot incorporate external factors.
• ASTGNN: The ASTGNN model is an attention-based spatial-temporal model, which stands out for the novel design of a special self-attention mechanism [43].In both of the short-term and long-term prediction scenarios, the model can achieve satisfying performance.
We deployed the above baseline models to predict traffic flow data for the next 15 min, 30 min, and 45 min horizons based on the 15 min time interval of the PeMS04 dataset.The experimental results are shown in Table 1.In the 15 min horizon, all the metrics of T-GCN were slightly better than Informer, thus indicating the necessity of graph neural network in solving the traffic flow prediction problem.At the same time, although the T-GCN performed better than the Informer in short-term predictions, its advantage could not be sustained in long-term predictions.This highlights the effectiveness of multihead attention in long-term prediction scenarios.In the 45 min horizon, the RMSE of the GAT-Informer was 13.40% smaller, the MAE was 12.94% smaller, and the MAPE was 14.54% smaller than the ASTGNN, which outperformed most of the baseline models.Moreover, as the prediction horizon lengthened, the performance of models experienced varying degrees of decay.Compared to other baseline models, the proposed GAT-Informer exhibited the least decay, thus maintaining high prediction performance.The MAE of the GRU increased by 10.48% from a 15 min prediction to 45 min prediction, while the MAE of the GAT-Informer only incresed by about 6.56%.However, the difference between the performance of the GAT-Informer and the ASTGCN is not obvious in the PeMS04 dataset experiment.
Based on the above experimental results, the performance of THe Informer, ASTGCN, and the proposed GAT-Informer were evaluated on the London dataset.They were used to predict the future traffic flow data of 15 min, 30 min, 45 min, 60 min, 90 min, and 120 min based on the 15 min time interval.For the impact factor of the sports event, ∆t defined in Equations ( 8) and ( 9) was set to 0.5 h, and t start and t end followed the schedule of the Laver Cup held in 2022, London.Sports events will affect nodes within a 10-kilometer radius.The experimental results are shown in Table 2.
According to the evaluation metrics in Table 2, GAT-Informer maintained the best performance.In long-term (45 min, 60 min, 90 min, and 120 min) traffic flow prediction, GAT-Informer's performance was only marginally attenuated; the model still maintained a high fitting degree and had the smallest error between the predict values and true values among all the baseline models.When the input of the model removed the sports events impact factor, the performance of the model decreased.In the 15 min horizon, regardless of the impact of sports events, the RMSE increased by 1.95%, the MAE increased by 6.35%, and the MAPE increased by 6.35%.In the 120 min horizon, regardless of the impact of sports events, the RMSE increased by 1.31%, the MAE increased by 1.02%, and the MAPE increased by 1.03%.Based on the model evaluation, the fusion of sports events is proven to be effective in both short-term prediction or long-term prediction.Although the model performance deteriorated with longer prediction horizons, our proposed model demonstrated the least performance degradation.Meanwhile, the GAT-Informer model considering sports events outperformed the ASTGCN in both short-term and long-term predictions, despite both models achieving similar performance on the PeMS04 dataset.Compared to other baseline models, the proposed GAT-Informer model demonstrates superior capability in extracting information from sports events, thereby enhancing prediction performance and maintaining high prediction accuracy when the prediction horizon lengthens.

Evaluation of Arrival and Depature Redundancy
A redundancy ∆t was set in Equations ( 8) and ( 9) to represent the early arrival and delayed departure of the audience.Although the start and end times of a game are predetermined, varying redundancies of ∆t can influence the impact factor curve, thereby affecting traffic flow predictions.In this case study, we evaluated the effect of different ∆t values on the prediction performance of GAT-Informer.The model was evaluated in prediction horizons of 30 min, 60 min, 90 min, and 120 min, with ∆t varying from 10 min to 40 min, and the experiment results are shown in Table 3. Table 3 shows that varying the redundancy ∆t did not lead to significant performance changes.However, in general, a redundancy of 20 min slightly improved the prediction performance compared to other settings.This suggests that the impact factor σ influences the model based on the audience arrival and departure patterns.Among different settings, a redundancy of 20 min best represents these patterns in the London dataset.

Results Visualization
In order to visualize the prediction performance of each model in Table 2, we randomly selected the prediction results of a continuous 24 h time interval by two different detectors based on a 15 min horizon and a 120 min horizon.The results are shown in Figure 9 and Figure 10, respectively.By comparing the difference between the true values and the predicted values, we can intuitively compare the performance of these three models in terms of short-term prediction and long-term prediction.
From these curves, we can find that our GAT-Informer model had a very good model fit, with a quite small error between the true values and predicted values, thus keeping basically the same curve when dealing with short-term traffic flow prediction.When dealing with long-term traffic flow prediction, although GAT-Informer did not perform as well as the short-term, it was still better than the other baseline models.Its prediction accuracy of peak traffic flow was good, but it had some errors in valley traffic flow.For traffic management, it is often more important to control the peak traffic flow, which may have a greater impact on vehicle travel time.

Conclusions
In this paper, the GAT-Informer model was proposed for long-term traffic flow prediction considering the impact of sports events.The model contains the Graph Attention layer and Informer layer.Graph Attention layer was used to fuse the information of neighbor nodes so as to fully captured the spatial features.The Informer layer was used to capture the dynamic time correlation between different times based on its ProbSparse attention mechanism.The GAT-Informer and other baseline models were tested on the PeMS04 dataset and London dataset.The experiment results showed that the GAT-Informer had better prediction performance in different horizons and metrics.
The detectors may fail temporarily due to failure, blackout, or other reasons; hence, the traffic flow data recorded by them may have missing data.Future plans include introducing fuzzy representation into the deep learning model to reduce the impact of data uncertainty and thus improve the matching of the model to various datasets.Funding: This study is supported under the RIE2020 Industry Alignment Fund-Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s), and A*STAR under its Industry Alignment Fund (LOA Award I1901E0046).

Figure 1 .
Calculation procedure for σ in one game.(a) shows the original f (z) (blue curve) and g(z) (orange curve); (b) shows the value σ after taking maximum.σ in this figure only keeps the value between 30 h before the start of the game and 30 h after the end of the game for illustration.

Figure 5 .
Figure 5. Informer layer architecture.The blue input sequences are the input of encoder, which are the same as the output of Graph Attention layer.The red input sequences are the input of decoder, and they contain two parts: label data and predict data.

Figure 9 .
Figure 9.The 15 min horizon: 24 h prediction results on a randomly selected detector.

Figure 10 .
Figure 10.The 120 min horizon: 24 h prediction results on a randomly selected detector.

Table 1 .
Average performance comparison of different baseline models for traffic flow prediction on PeMS04 dataset.

Table 2 .
Average performance comparison of different baseline models for traffic flow prediction on London dataset.
* means the dataset removes the sports events impact factor.

Table 3 .
Average performance comparison of different redundancy ∆t values.